An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Rewards
Authors
Abstract
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions, though costly, eliminate the need for simulation and hence avoid approximation error. Our experiments show that, for the same computational cost, policy optimization methods that rely on analytical tractability achieve higher value than those that rely on simulation.
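As an illustration only (this is not the paper's derivation, and every constant and name below is invented for the example), the sketch that follows shows the analytical ingredient the abstract emphasizes: for one-dimensional linear Gaussian dynamics under a linear feedback policy, the state marginals stay Gaussian, so the expected value of a Gaussian-mixture reward is available in closed form and the policy gain can be improved by plain numerical optimization, with no trajectory simulation. The direct gain search here is a simpler stand-in for the paper's EM updates.

# Illustrative sketch, not the method from the paper: closed-form expected
# return for a 1-D linear Gaussian system with a Gaussian-mixture reward.
import numpy as np
from scipy.optimize import minimize_scalar

# Dynamics: x_{t+1} = a*x_t + b*u_t + eps, with eps ~ N(0, q)
a, b, q = 1.0, 0.5, 0.05
T = 20                        # horizon
x0_mean, x0_var = -2.0, 0.1   # initial state distribution

# Reward r(x) = sum_j w_j * N(x; mu_j, s_j): a flexible mixture of Gaussians
w  = np.array([1.0, 0.5])
mu = np.array([0.0, 1.5])
s  = np.array([0.2, 0.1])     # component variances

def gauss_expectation(x_mean, x_var, m, v):
    # E[N(x; m, v)] for x ~ N(x_mean, x_var) equals N(x_mean; m, x_var + v)
    return np.exp(-0.5 * (x_mean - m) ** 2 / (x_var + v)) / np.sqrt(2.0 * np.pi * (x_var + v))

def expected_return(k):
    # Propagate Gaussian state marginals under the policy u = k*x and sum the
    # closed-form expected mixture reward at every step (no simulation).
    mean, var = x0_mean, x0_var
    total = 0.0
    for _ in range(T):
        c = a + b * k
        mean, var = c * mean, c ** 2 * var + q
        total += np.sum(w * gauss_expectation(mean, var, mu, s))
    return total

# Numerical optimization of the analytically computed objective over the gain k
res = minimize_scalar(lambda k: -expected_return(k), bounds=(-5.0, 5.0), method="bounded")
print(f"optimized feedback gain k = {res.x:.3f}, expected return = {-res.fun:.3f}")

Because the objective is evaluated analytically rather than by Monte Carlo rollouts, there is no simulation-induced approximation error, which mirrors the trade-off described above.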
Similar Papers
Decision making with inference and learning methods
In this work we consider probabilistic approaches to sequential decision making. The ultimate goal is to provide methods by which decision making problems can be attacked by approaches and algorithms originally built for probabilistic inference. This in turn allows us to directly apply a wide variety of popular, practical algorithms to these tasks. In Chapter 1 we provide an overview of the gen...
An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widel...
Probabilistic inference for solving (PO)MDPs
The development of probabilistic inference techniques has made considerable progress in recent years, in particular with respect to exploiting the structure (e.g., factored, hierarchical or relational) of discrete and continuous problem domains. We show that these techniques can also be used for solving Markov Decision Processes (MDPs) or partially observable MDPs (POMDPs) when formulated in term...
On Solving General State-Space Sequential Decision Problems using Inference Algorithms
A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. For example, Toussaint et al (2006) uses EM with optimal smo...
Multistage Markov Decision Processes with Minimum Criteria of Random Rewards
We consider multistage decision processes where the criterion function is an expectation of a minimum function. We formulate them as Markov decision processes with imbedded parameters. The policy depends upon a history including past imbedded parameters, and the rewards at each stage are random and depend upon the current state, action, and next state. We then give an optimality equation by using operat...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2009